Dataset summary: Dengue

Report generated using dataprep.

Dengue dataset report

Overview

Dataset Statistics

Number of Variables 14
Number of Rows 124897
Missing Cells 513559
Missing Cells (%) 29.4%
Duplicate Rows 63027
Duplicate Rows (%) 50.5%
Total Size in Memory 46.3 MB
Average Row Size in Memory 388.9 B
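
The two percentages above can be re-derived from the raw counts. The sketch below is only an arithmetic check on the figures in this table. It assumes that the missing-cell percentage is taken over rows × variables and the duplicate percentage over rows; the variable names are illustrative and not part of dataprep.

```python
# Figures copied from the "Dataset Statistics" table.
n_rows = 124897
n_vars = 14
missing_cells = 513559
duplicate_rows = 63027

# Assumption: percentages are relative to all cells / all rows.
total_cells = n_rows * n_vars
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_rows

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}")
print(f"Duplicate rows (%): {duplicate_pct:.1f}")
```

Under those assumptions the rounded values agree with the 29.4% and 50.5% reported above.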

Variable Types

Categorical 9
Numerical 5

Variables

abdominal_pain

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 13921
Missing (%) 11.2%
Memory Size 7.4 MB

Length

Mean 4.7556
Standard Deviation 0.4297
Median 5
Minimum 4
Maximum 5

Sample

1st row False
2nd row False
3rd row False
4th row False
5th row False

Letter

Count 527762
Lowercase Letter 416786
Space Separator 0
Uppercase Letter 110976
Dash Punctuation 0
Decimal Number 0

age

numerical

Distinct Count 133
Unique (%) 0.1%
Missing 213
Missing (%) 0.2%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.9 MB
Mean 10.7163
Minimum 0
Maximum 88
Zeros 33
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%

Quantile Statistics

Minimum 0
5-th Percentile 2
Q1 6
Median 10
Q3 13
95-th Percentile 26
Maximum 88
Range 88
IQR 7

Descriptive Statistics

Mean 10.7163
Standard Deviation 7.2255
Variance 52.2085
Sum 1.3361e+06
Skewness 2.4783
Kurtosis 11.9615
Coefficient of Variation 0.6743
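
Several of the figures reported for age follow directly from the others. The sketch below re-derives them from the rounded values in this report, assuming the usual definitions (IQR = Q3 − Q1, coefficient of variation = standard deviation / mean, sum = mean × non-missing count). Small differences in the last digits are expected because the inputs are already rounded.

```python
# Values copied from the "age" section of the report.
n_rows, n_missing = 124897, 213
minimum, maximum = 0, 88
q1, q3 = 6, 13
mean, std = 10.7163, 7.2255

value_range = maximum - minimum      # reported Range: 88
iqr = q3 - q1                        # reported IQR: 7
cv = std / mean                      # reported Coefficient of Variation: 0.6743
variance = std ** 2                  # reported Variance: 52.2085
total = mean * (n_rows - n_missing)  # reported Sum: 1.3361e+06

print(value_range, iqr, round(cv, 4), round(variance, 2), f"{total:.4e}")
```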

ascites

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 50783
Missing (%) 40.7%
Memory Size 4.9 MB

Length

Mean 4.9615
Standard Deviation 0.1924
Median 5
Minimum 4
Maximum 5

Sample

1st row False
2nd row False
3rd row False
4th row False
5th row False

Letter

Count 367716
Lowercase Letter 293602
Space Separator 0
Uppercase Letter 74114
Dash Punctuation 0
Decimal Number 0

bleeding_gum

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 50751
Missing (%) 40.6%
Memory Size 4.9 MB

Length

Mean 4.9652
Standard Deviation 0.1833
Median 5
Minimum 4
Maximum 5

Sample

1st row False
2nd row False
3rd row False
4th row False
5th row False

Letter

Count 368148
Lowercase Letter 294002
Space Separator 0
Uppercase Letter 74146
Dash Punctuation 0
Decimal Number 0

bleeding_mucosal

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 23383
Missing (%) 18.7%
Memory Size 6.8 MB

Length

Mean 4.8933
Standard Deviation 0.3088
Median 5
Minimum 4
Maximum 5

Sample

1st row False
2nd row False
3rd row False
4th row False
5th row False

Letter

Count 496736
Lowercase Letter 395222
Space Separator 0
Uppercase Letter 101514
Dash Punctuation 0
Decimal Number 0

bleeding_skin

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 57870
Missing (%) 46.3%
Memory Size 4.5 MB

Length

Mean 4.7434
Standard Deviation 0.4368
Median 5
Minimum 4
Maximum 5

Sample

1st row False
2nd row False
3rd row False
4th row False
5th row False

Letter

Count 317933
Lowercase Letter 250906
Space Separator 0
Uppercase Letter 67027
Dash Punctuation 0
Decimal Number 0

body_temperature

numerical

Distinct Count 529
Unique (%) 2.0%
Missing 97812
Missing (%) 78.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 423.2 KB
Mean 37.7906
Minimum 35
Maximum 41.5
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%

Quantile Statistics

Minimum 35
5-th Percentile 37
Q1 37
Median 37.5
Q3 38.5
95-th Percentile 39.5
Maximum 41.5
Range 6.5
IQR 1.5

Descriptive Statistics

Mean 37.7906
Standard Deviation 0.9069
Variance 0.8224
Sum 1.0236e+06
Skewness 0.9686
Kurtosis 0.01539
Coefficient of Variation 0.024

gender

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 173
Missing (%) 0.1%
Memory Size 8.3 MB

Length

Mean 4.9238
Standard Deviation 0.9971
Median 4
Minimum 4
Maximum 6

Sample

1st row Female
2nd row Female
3rd row Female
4th row Female
5th row Female

Letter

Count 614116
Lowercase Letter 489392
Space Separator 0
Uppercase Letter 124724
Dash Punctuation 0
Decimal Number 0

haematocrit_percent

numerical

Distinct Count 1060
Unique (%) 2.3%
Missing 77905
Missing (%) 62.4%
Infinite 0
Infinite (%) 0.0%
Memory Size 734.2 KB
Mean 39.862
Minimum 1
Maximum 74.4
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%

Quantile Statistics

Minimum 1
5-th Percentile 32.8
Q1 36.7
Median 39.4
Q3 42.5
95-th Percentile 49
Maximum 74.4
Range 73.4
IQR 5.8

Descriptive Statistics

Mean 39.862
Standard Deviation 4.93
Variance 24.3053
Sum 1.8732e+06
Skewness 0.5428
Kurtosis 1.522
Coefficient of Variation 0.1237

plt

numerical

Distinct Count 2333
Unique (%) 5.0%
Missing 78088
Missing (%) 62.5%
Infinite 0
Infinite (%) 0.0%
Memory Size 731.4 KB
Mean 5484.7121
Minimum 1
Maximum 268768.5
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%

Quantile Statistics

Minimum 1
5-th Percentile 27
Q1 83
Median 159
Q3 257
95-th Percentile 50050
Maximum 268768.5
Range 268767.5
IQR 174

Descriptive Statistics

Mean 5484.7121
Standard Deviation 21859.904
Variance 4.7786e+08
Sum 2.5673e+08
Skewness 4.9072
Kurtosis 27.284
Coefficient of Variation 3.9856

weight

numerical

Distinct Count 324
Unique (%) 0.3%
Missing 545
Missing (%) 0.4%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.9 MB
Mean 32.1729
Minimum 7.2
Maximum 114
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%

Quantile Statistics

Minimum 7.2
5-th Percentile 13
Q1 21
Median 30
Q3 42
95-th Percentile 58
Maximum 114
Range 106.8
IQR 21

Descriptive Statistics

Mean 32.1729
Standard Deviation 14.3309
Variance 205.3754
Sum 4.0008e+06
Skewness 0.695
Kurtosis 0.2718
Coefficient of Variation 0.4454

bleeding

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 61556
Missing (%) 49.3%
Memory Size 4.2 MB

Length

Mean 4.5694
Standard Deviation 0.4952
Median 5
Minimum 4
Maximum 5

Sample

1st row False
2nd row False
3rd row False
4th row False
5th row False

Letter

Count 289431
Lowercase Letter 226090
Space Separator 0
Uppercase Letter 63341
Dash Punctuation 0
Decimal Number 0

shock

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 559
Missing (%) 0.4%
Memory Size 8.3 MB

Length

Mean 4.9413
Standard Deviation 0.2351
Median 5
Minimum 4
Maximum 5

Sample

1st row True
2nd row True
3rd row True
4th row True
5th row True

Letter

Count 614387
Lowercase Letter 490049
Space Separator 0
Uppercase Letter 124338
Dash Punctuation 0
Decimal Number 0

dsource

categorical

Distinct Count 10
Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 8.1 MB

Length

Mean 3.0035
Standard Deviation 1.0155
Median 2
Minimum 2
Maximum 5

Sample

1st row 01nva
2nd row 01nva
3rd row 01nva
4th row 01nva
5th row 01nva

Letter

Count 250201
Lowercase Letter 250201
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 124924

Interactions

Correlations

Missing Values



# Libraries
from dataprep.eda import create_report
from pkgname.utils.data_loader import load_dengue
from pkgname.utils.print_utils import suppress_stdout, suppress_stderr

# Columns to load and profile
features = ["dsource", "age", "gender", "weight", "bleeding", "plt",
            "shock", "haematocrit_percent", "bleeding_gum", "abdominal_pain",
            "ascites", "bleeding_mucosal", "bleeding_skin", "body_temperature"]

# Enter both context managers (comma form); "a() and b()" would
# evaluate to a single object, so only one manager would be entered.
with suppress_stdout(), suppress_stderr():
    df = load_dengue(usecols=features)
    report = create_report(df, title="Dengue dataset report")

# Display the report
report
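
A note on combining context managers in a single `with` statement: the managers should be separated by a comma. An expression such as `a() and b()` evaluates to one object (the second, when the first is truthy), so only that one is entered. The sketch below illustrates the difference with a hypothetical `tracker` context manager, which stands in for `suppress_stdout` and `suppress_stderr` because their implementations are not shown here.

```python
from contextlib import contextmanager

entered = []  # records which managers were actually entered


@contextmanager
def tracker(name):
    """Hypothetical stand-in for suppress_stdout / suppress_stderr."""
    entered.append(name)
    yield


# "and" form: the expression evaluates to the second manager only.
with tracker("stdout") and tracker("stderr"):
    pass
and_form = list(entered)

entered.clear()

# Comma form: both managers are entered, left to right.
with tracker("stdout"), tracker("stderr"):
    pass
comma_form = list(entered)

print(and_form)    # ['stderr']
print(comma_form)  # ['stdout', 'stderr']
```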

Total running time of the script: (0 minutes 4.891 seconds)

Gallery generated by Sphinx-Gallery